A. STATEMENT OF THE PROBLEM STUDIED


The  objectives  of  this  research  are  to  develop  efficient  parallel  algorithms  for  VLSI  CAD  tasks  that  can  utilize 
the  computing  power  of  a  wide  range  of  parallel  platforms  that  are  becoming  available  to  the  community, 
with  the  eventual  goal  of  reducing  the  design  turnaround  time  of  complex  chips  of  the  future. 

In this research we have investigated parallel algorithms for placement, routing, layout verification and
extraction, logic synthesis, test generation, fault simulation, and behavioral simulation. The parallel
algorithms have been designed to be portable across a range of parallel machines, including multiprocessor
workstations, shared memory multiprocessors, message passing multiprocessors, and networks of workstations.
They have been designed to run on top of the ProperCAD2 C++ library as well as the Message Passing
Interface (MPI).

B.  SUMMARY  OF  MOST  IMPORTANT  RESULTS 

Accomplishment  1 

Our parallel algorithms for placement are based on simulated annealing, and several parallelization strategies
have been pursued. The ProperPLACE-PM algorithm has different processors perform moves in parallel on
different cells in the design; the cells are partitioned among the processors, but each cell can be moved
anywhere in the chip image. A second strategy, ProperPLACE-SC, uses speculative execution of moves. The
idea is to view simulated annealing as a sequence of moves, each of which is either accepted or rejected; each
processor speculatively assumes that a particular sequence of preceding moves will be accepted or rejected and
evaluates the next move on that basis. A third strategy, ProperPLACE-MMC, is based on multiple Markov
chains: different processors perform completely independent simulated annealing runs with different random
seeds and periodically exchange solutions. We have obtained portable parallel implementations of each of
these approaches. The ProperPLACE-MMC algorithm achieved speedups of 6.5 on 8 processors with less than
3% degradation in solution quality, and the ProperPLACE-PM scheme achieved speedups of 4.5 on 8
processors with about 5-10% degradation. The ProperPLACE-SC scheme yielded no speedup, because the
cost of communication dominated the computation.

In addition, a new circuit-partitioned algorithm for standard cell placement, called ProperPLACE-PART, has
been developed for very large problems that cannot run on a single processor because of memory limitations.
Each processor performs annealing-based moves on the cells in its current subset but may move them
anywhere across the circuit; periodically, the partitioning of cells among the processors is changed. This
circuit-partitioned approach scales to larger numbers of processors with little loss of quality and with
excellent memory scalability. Speedups of about 6 on 8 processors of a SPARCserver 1000 and about 11 on
32 processors of a CM-5 have been measured, and the quality of the parallel solutions is within 5% of that of
the sequential algorithm. The strength of this memory-scalable algorithm is that placement can be run on
circuits that do not fit in the memory of a single processor.
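To make the multiple-Markov-chain idea concrete, here is a minimal Python sketch. It is not the ProperPLACE-MMC implementation: the placement model (cells in a single row of slots, two-pin nets, pairwise-swap moves, geometric cooling) and every name and parameter (`mmc_anneal`, `rounds`, `steps`, `alpha`) are hypothetical, and the chains run one after another in a loop where a real implementation would run one chain per processor. The exchange step here simply restarts every chain from the best current solution.

```python
import math
import random


def wirelength(order, nets):
    """Total 1-D wirelength: sum of slot distances between the two cells of each net."""
    slot = {cell: i for i, cell in enumerate(order)}
    return sum(abs(slot[a] - slot[b]) for a, b in nets)


def anneal_segment(order, nets, temp, steps, rng):
    """Run `steps` pairwise-swap moves at a fixed temperature on one chain."""
    order = list(order)
    cost = wirelength(order, nets)
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        new_cost = wirelength(order, nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost  # accept the swap
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
    return order, cost


def mmc_anneal(cells, nets, chains=4, rounds=20, steps=200, t0=5.0, alpha=0.8, seed=0):
    """Independent chains with different seeds that periodically exchange solutions
    (here, every chain restarts from the best solution found in the last round)."""
    rngs = [random.Random(seed + k) for k in range(chains)]  # one random stream per chain
    states = [list(cells) for _ in range(chains)]
    temp = t0
    for _ in range(rounds):
        # In a parallel run, each call below would execute on its own processor.
        results = [anneal_segment(s, nets, temp, steps, r) for s, r in zip(states, rngs)]
        best_order, best_cost = min(results, key=lambda oc: oc[1])
        states = [list(best_order) for _ in range(chains)]  # exchange step
        temp *= alpha  # geometric cooling
    return best_order, best_cost
```

With a shuffled row of 12 cells connected as a chain of 11 two-pin nets, the returned order is always a permutation of the cells and its cost can never fall below 11, the wirelength of a perfectly ordered chain. Restarting all chains from the best solution is only one possible exchange policy; chains could also keep their own states and exchange less often, which reduces communication at the price of less sharing.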

Accomplishment  2 

We have developed several parallel algorithms for global routing of standard cell designs, based on the
TimberWolf 6.0 global router, following three different approaches. The first approach partitions the nets
across the processors, and each processor globally routes its own nets in parallel. The second approach
partitions the chip area among the processors by rows, and each processor routes all the nets belonging to its
region. The third is a hybrid approach in which part of the algorithm uses net decomposition and part uses
area decomposition. We have experimentally evaluated all three parallel algorithms. The hybrid algorithm
obtained the best speedups (about 6.5 on 8 processors of a Sun SPARCcenter 1000) while keeping the
quality degradation of the routing below 2% relative to the serial algorithm on various benchmark circuits.
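The two basic decompositions can be sketched in Python as follows. This is an illustration under stated assumptions, not code from TimberWolf or from the parallel routers: a net is a hypothetical list of `(x, row)` pin coordinates, the half-perimeter bounding box stands in for the routing cost of a net, nets are balanced across processors with a greedy largest-first heuristic, and in the row decomposition a net whose pins span several row bands is handed to every processor that owns one of those bands.

```python
import heapq


def hpwl(pins):
    """Half-perimeter of a net's bounding box, used as a cheap routing-cost estimate."""
    xs = [x for x, _ in pins]
    rows = [r for _, r in pins]
    return (max(xs) - min(xs)) + (max(rows) - min(rows))


def net_decomposition(nets, nprocs):
    """Give whole nets to processors, largest estimated cost first, always to the
    currently least-loaded processor (a greedy load-balancing heuristic)."""
    heap = [(0, p) for p in range(nprocs)]  # (load, processor id); already a valid heap
    assignment = {p: [] for p in range(nprocs)}
    for name in sorted(nets, key=lambda n: hpwl(nets[n]), reverse=True):
        load, p = heapq.heappop(heap)
        assignment[p].append(name)
        heapq.heappush(heap, (load + hpwl(nets[name]), p))
    return assignment


def row_decomposition(nets, num_rows, nprocs):
    """Split the cell rows into nprocs contiguous bands; each net goes to every
    processor whose band contains at least one of its pins."""
    def band(row):
        return min(row * nprocs // num_rows, nprocs - 1)

    assignment = {p: [] for p in range(nprocs)}
    for name, pins in nets.items():
        for p in sorted({band(r) for _, r in pins}):
            assignment[p].append(name)
    return assignment
```

The nets that appear in more than one band under the row decomposition are exactly the ones whose routing has to be coordinated between processors.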

Accomplishment  3 

We have developed three parallel layout verification algorithms for mask layouts. The first, called
ProperDRC1, uses data parallelism to distribute the rectangles of a mask layout to different processors.
The second, called ProperDRC2, uses task parallelism to assign different design rules to different
processors. The third combines task and data parallelism to obtain the benefits of both approaches.
Speedups  of  about  100  have  been  obtained  on  128  processors  of  a  CM-5. 
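Data parallelism in the style of ProperDRC1 can be illustrated with a minimal Python sketch of a single spacing rule. The rule is simplified (two disjoint rectangles on one layer violate it when the larger of their x and y gaps is positive but below `min_space`), coordinates are assumed to lie in `[0, width]`, and the stripe partitioning with a halo is an assumption for illustration rather than the exact ProperDRC1 scheme. Each stripe owns the rectangles whose left edge falls inside it and also examines every rectangle within `min_space` of its owned geometry, so a violation that crosses a stripe boundary is still found; the final union removes duplicates reported by neighboring stripes.

```python
from itertools import combinations


def spacing_violation(a, b, min_space):
    """Simplified spacing rule for rectangles (x1, y1, x2, y2) on one layer:
    violated when the rectangles are apart but closer than min_space."""
    dx = max(0, b[0] - a[2], a[0] - b[2])  # horizontal gap (0 if x-ranges overlap)
    dy = max(0, b[1] - a[3], a[1] - b[3])  # vertical gap (0 if y-ranges overlap)
    gap = max(dx, dy)
    return 0 < gap < min_space


def serial_drc(rects, min_space):
    """Reference check: test every pair of rectangles."""
    return {(i, j) for i, j in combinations(range(len(rects)), 2)
            if spacing_violation(rects[i], rects[j], min_space)}


def stripe_drc(rects, min_space, width, nprocs):
    """Data-parallel check over nprocs vertical stripes of the chip [0, width)."""
    stripe_w = width / nprocs
    found = set()
    for k in range(nprocs):  # each iteration stands in for one processor
        lo, hi = k * stripe_w, (k + 1) * stripe_w
        last = k == nprocs - 1
        owned = [i for i, r in enumerate(rects) if lo <= r[0] and (r[0] < hi or last)]
        if not owned:
            continue
        # Halo: anything within min_space of the stripe or of its owned geometry.
        reach = max([hi] + [rects[i][2] for i in owned]) + min_space
        window = [i for i, r in enumerate(rects) if r[0] < reach and r[2] > lo - min_space]
        for i, j in combinations(window, 2):
            if spacing_violation(rects[i], rects[j], min_space):
                found.add((i, j))
    return found  # merge step: union of all processors' reports
```

A rectangle that violates the rule with a neighbor owned by the next stripe is caught because the owning stripe's window extends `min_space` past its owned geometry, so the partitioned result matches the serial check.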

Accomplishment 4

Our parallel algorithms for logic synthesis are based on the MIS/SIS approaches developed by researchers at
Berkeley. We have developed parallel algorithms for several key transformations in combinational logic
synthesis, specifically kernel extraction, cube extraction, node resubstitution, and node simplification. Our
implementations of the parallel MIS algorithm have achieved speedups of 6.0 on 8 processors of a Sun
SPARCcenter 1000.

We have also developed three parallel algorithms for the algebraic factorization procedures of combinational
logic synthesis within the MIS system. The first algorithm replicates the circuit and uses a divide-and-conquer
strategy to follow the same search path as the sequential algorithm. The second performs totally independent
factorization on different circuit partitions. The third uses a novel L-shaped partitioning strategy that allows
some interaction among the kernels in different partitions. All three algorithms have been implemented on a
Sun SPARCserver 1000. For a large circuit with 14,000 literals, the third algorithm runs 11.5 times faster than
the sequential algorithm with less than 0.2% degradation in the quality of the results.

Accomplishment  5 

In the area of sequential logic synthesis, two novel parallel algorithms for state assignment of finite state
machines have been developed: ProperSTATE and ProperJEDI, based respectively on MUSTANG and JEDI,
the state assignment programs of the SIS synthesis tool. Both algorithms are data-partitioned, so that they are
both processor and memory scalable for very large finite state machines; that is, they can perform state
assignment on examples that do not fit on a single processor but can be distributed across the memory of a
parallel machine. Speedups of about 7 on an 8-processor SPARCserver 1000 and about 30 on a 64-processor
CM-5 have been measured for both algorithms, and the quality of the parallel solutions is within 1% of that of
the sequential algorithms. These algorithms will enable state assignment of finite state machines with 200
latches and beyond, pushing the current state of the art beyond circuits of about 20 latches.

In the area of FPGA synthesis, we have developed a parallel algorithm for technology mapping of look-up
table (LUT) based FPGAs. Speedups of 6.5 on 8 processors have been obtained on various benchmark circuits
on a Sun SPARCcenter 1000 multiprocessor.

Accomplishment  6 

Three new parallel algorithms for sequential circuit test generation, based on the genetic-algorithm test
generator GATEST, have been developed. The first, ProperGATEST1, uses data decomposition, partitioning
the populations of the genetic algorithm across the processors, and obtains speedups of about 6.8 on 8
processors of a SPARCserver 1000 without any degradation in solution quality relative to the sequential
algorithm. The second, ProperGATEST2, uses a parallel search strategy in which each processor executes the
sequential genetic algorithm with a different seed, and migration is used to share information between
processors; it obtains speedups of about 5.3 on 8 processors of a Sun SPARCserver 1000 with quality
comparable to that of the serial algorithm. The third, ProperGATEST3, is a subpopulation-based version of
ProperGATEST2 in which subpopulations are distributed across processors and information is migrated from
one processor to another; it obtains speedups of about 7.2 on 8 processors of a Sun SPARCserver 1000, again
with quality comparable to that of the serial algorithm. These algorithms will enable the generation of tests
for the very complex circuits of the future.
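The island-model idea behind ProperGATEST2 and ProperGATEST3 can be sketched in Python as below. This is not GATEST: the bit-string individuals and the toy fitness (number of 1 bits) stand in for test sequences scored by fault simulation, and the operators (elitism, tournament selection, one-point crossover, bit-flip mutation), the ring migration topology, and all parameter values are assumptions chosen for brevity. Each island uses its own random seed, and every `migrate_every` generations it receives the best individual of its neighbor in place of its own worst.

```python
import random


def fitness(bits):
    """Toy stand-in for a simulation-based fitness: the number of 1 bits."""
    return sum(bits)


def evolve(pop, rng, mut_rate=0.02):
    """One generation on one island: elitism, tournament selection,
    one-point crossover, and bit-flip mutation."""
    def pick():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    length = len(pop[0])
    children = [max(pop, key=fitness)]  # keep the best individual unchanged
    while len(children) < len(pop):
        p1, p2 = pick(), pick()
        cut = rng.randrange(1, length)
        child = p1[:cut] + p2[cut:]
        children.append([b ^ 1 if rng.random() < mut_rate else b for b in child])
    return children


def island_ga(islands=4, pop_size=20, length=32, generations=60, migrate_every=5, seed=0):
    """Independent subpopulations (one per processor) with ring migration."""
    rngs = [random.Random(seed * 1000 + k) for k in range(islands)]
    pops = [[[r.randint(0, 1) for _ in range(length)] for _ in range(pop_size)] for r in rngs]
    for gen in range(1, generations + 1):
        pops = [evolve(p, r) for p, r in zip(pops, rngs)]
        if gen % migrate_every == 0:
            bests = [max(p, key=fitness) for p in pops]
            for k, p in enumerate(pops):
                worst = min(range(pop_size), key=lambda i: fitness(p[i]))
                p[worst] = list(bests[k - 1])  # migrant from the previous island in the ring
    return max((ind for p in pops for ind in p), key=fitness)
```

Because each island's random stream is seeded independently, a run is reproducible for a given `seed`; the migration interval controls how often processors would have to communicate in a distributed version.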

Accomplishment  7 

We have developed various parallel algorithms for fault simulation. Whereas previous approaches to parallel
fault simulation have used circuit parallelism or fault parallelism, over the past year we have developed
scalable parallel test-set-partitioned algorithms for fault simulation in a series of implementations called
SPITFIRE. The basic idea is to partition the test set among the processors so that each processor performs
fault simulation on its own subset of input vectors, but on the entire circuit and the entire fault list. While
this approach applies directly to combinational circuits, it does not apply directly to sequential circuits,
where there are state dependencies across time frames; that is, the state of the circuit in the present time
frame may depend on its state in all previous time frames. We proposed allowing some overlap of test
vectors among the test partitions assigned to the different processors. By varying the degree of overlap, it
was possible to control both the quality of the results (fault coverage) and the speedups obtained. We
developed six variants of this algorithm, one of which combined fault parallelism and test-set parallelism very
effectively. The most efficient algorithms produced average speedups of about 6.5 on 8 processors of a
SPARCserver 1000 multiprocessor on several large sequential benchmarks.
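The role of the overlap can be seen on a toy sequential circuit. In this hypothetical Python sketch the circuit is a 2-bit shift register whose whole state is its output, the faults are stuck-at-0/1 faults on the two state bits, and every processor's partition starts from the reset state; none of this is taken from SPITFIRE. Vectors in a partition's overlap are simulated only to warm up the state, and detections are credited only for the vectors the partition owns. Because this register's state depends only on the current and previous input vectors, an overlap of a single vector already reproduces the serial result; in real circuits the state may depend on the entire history, so a finite overlap only approximates the serial fault coverage.

```python
# Hypothetical fault list: (state bit, stuck-at value) for both bits of the register.
FAULTS = [(bit, val) for bit in (0, 1) for val in (0, 1)]


def step(state, v, fault=None):
    """Toy 2-bit shift register: the new state is ((state << 1) | v) & 3 and is
    also the observed output. A fault forces one state bit to a constant."""
    s = ((state << 1) | v) & 3
    if fault is not None:
        bit, val = fault
        s = (s | (1 << bit)) if val else (s & ~(1 << bit))
    return s


def simulate(vectors, begin, first, last):
    """Simulate from reset starting at index `begin`, crediting detections only
    for vectors in [first, last); vectors in [begin, first) just warm up state."""
    good = 0
    bad = {f: 0 for f in FAULTS}
    detected = set()
    for t in range(begin, last):
        good = step(good, vectors[t])
        for f in FAULTS:
            bad[f] = step(bad[f], vectors[t], f)
            if t >= first and bad[f] != good:
                detected.add(f)
    return detected


def partitioned_fault_sim(vectors, nprocs, overlap):
    """Test-set partitioning: each processor owns a contiguous slice of the vectors
    and re-simulates `overlap` earlier vectors to approximate its starting state."""
    n = len(vectors)
    bounds = [n * k // nprocs for k in range(nprocs + 1)]
    detected = set()
    for k in range(nprocs):  # each iteration stands in for one processor
        first, last = bounds[k], bounds[k + 1]
        detected |= simulate(vectors, max(0, first - overlap), first, last)
    return detected
```

With `overlap=0`, each partition starts from the reset state instead of the true state, which can hide or invent detections near the start of the partition.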

Accomplishment  8 

We have developed an efficient compiled event-driven simulation algorithm for VHDL. Two approaches to
parallelizing it on shared memory multiprocessors were developed. The first was based on fine-grained task
scheduling, where each task corresponded to a straight-line sequence of VHDL code without any wait
statements. The second was based on a coarse-grained partitioning of the program, obtained by identifying
fan-in cones of the circuits being simulated. Both approaches were evaluated on a set of benchmark
examples, and speedups of about 3 to 4 were measured on 8 processors of a Sun SPARCserver 1000
multiprocessor.

Accomplishment  9 

We have developed several parallel algorithms for high-level synthesis of signal flow graphs. One set of
parallel algorithms addresses the force-directed scheduling problem by partitioning the nodes and time steps
across the processors; speedups of about 4 have been measured on 8 processors of an SGI Origin, and about 8
on 16 processors of an IBM SP2. Another set is based on a multiple-Markov-chain model of parallel simulated
annealing that simultaneously performs scheduling, allocation, and floorplanning; speedups of about 14 have
been reported on several benchmark circuits on 16 processors of an IBM SP2 multiprocessor.
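Force-directed scheduling is driven by a distribution graph that, for each control step, sums the probabilities of operations being scheduled in that step. The Python sketch below shows how computing that graph can be split by nodes across processors and combined with a reduction. It makes simplifying assumptions that are not from the report: unit-delay operations of a single type, nodes given in topological order, a latency bound at least as long as the critical path, a uniform probability of 1/(ALAP - ASAP + 1) over each time frame, and a round-robin assignment of nodes to processors simulated by a loop.

```python
from fractions import Fraction


def time_frames(preds, latency_bound):
    """Unit-delay ASAP and ALAP control steps (1-based) for a DAG given as
    {node: [predecessors]} with the nodes listed in topological order."""
    asap = {}
    for n, ps in preds.items():
        asap[n] = 1 + max((asap[p] for p in ps), default=0)
    succs = {n: [] for n in preds}
    for n, ps in preds.items():
        for p in ps:
            succs[p].append(n)
    alap = {}
    for n in reversed(list(preds)):
        alap[n] = min((alap[s] - 1 for s in succs[n]), default=latency_bound)
    return asap, alap


def partial_dg(nodes, asap, alap, steps):
    """One processor's contribution: each node spreads probability 1/(frame width)
    uniformly over the control steps in its [ASAP, ALAP] time frame."""
    dg = [Fraction(0)] * steps
    for n in nodes:
        width = alap[n] - asap[n] + 1
        for t in range(asap[n], alap[n] + 1):
            dg[t - 1] += Fraction(1, width)
    return dg


def parallel_dg(preds, latency_bound, nprocs):
    asap, alap = time_frames(preds, latency_bound)
    nodes = list(preds)
    shares = [nodes[k::nprocs] for k in range(nprocs)]  # round-robin node partition
    partials = [partial_dg(s, asap, alap, latency_bound) for s in shares]
    return [sum(col, Fraction(0)) for col in zip(*partials)]  # the reduction step
```

For example, with operations `a` and `b` feeding `c`, an independent operation `d`, and a latency bound of 3 steps, `parallel_dg` returns 4/3, 11/6 and 5/6 for steps 1 to 3 regardless of the number of processors; the entries sum to 4, the number of operations.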

Accomplishment  10 

We  have  developed  several  parallel  algorithms  for  power  estimation  of  combinational  and  sequential 
circuits  based  on  exhaustive  simulation  and  Monte  Carlo  methods.  Speedups  of  about  14  on  16  processors 
were  obtained  for  combinational  circuits  and  about  10  on  16  processors  for  sequential  circuits  on  an  IBM 
SP2  multiprocessor. 
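The Monte Carlo approach parallelizes naturally because samples are independent. In the minimal Python sketch below, which is an illustration rather than the authors' code, each simulated processor draws its own random input-vector pairs with a separate seed and counts node toggles on a small hypothetical three-gate netlist, and a final reduction averages the counts. Treating every node as having equal load capacitance is an assumption; a power estimate would weight each node's toggles by its capacitance.

```python
import random

# Hypothetical 3-input netlist in topological order: node -> (gate, fanins).
NETLIST = {
    "n1": ("AND", ("a", "b")),
    "n2": ("OR", ("b", "c")),
    "out": ("XOR", ("n1", "n2")),
}
OPS = {"AND": lambda x, y: x & y, "OR": lambda x, y: x | y, "XOR": lambda x, y: x ^ y}


def evaluate(inputs):
    """Return the logic value of every input and internal node."""
    values = dict(inputs)
    for node, (gate, (x, y)) in NETLIST.items():
        values[node] = OPS[gate](values[x], values[y])
    return values


def worker(num_pairs, seed):
    """One processor's share: simulate random input-vector pairs and count
    how many nodes toggle between the two vectors of each pair."""
    rng = random.Random(seed)
    toggles = 0
    for _ in range(num_pairs):
        v1 = evaluate({i: rng.randint(0, 1) for i in "abc"})
        v2 = evaluate({i: rng.randint(0, 1) for i in "abc"})
        toggles += sum(v1[n] != v2[n] for n in v1)
    return toggles, num_pairs


def estimate_activity(total_pairs, nprocs, seed=0):
    """Split the samples evenly, run the workers, and reduce to the average
    number of node toggles per vector pair."""
    results = [worker(total_pairs // nprocs, seed + k) for k in range(nprocs)]
    toggles = sum(t for t, _ in results)
    pairs = sum(p for _, p in results)
    return toggles / pairs
```

For this netlist the exact expected value is 2.75 toggles per vector pair (0.5 for each input and for `out`, 0.375 for each of `n1` and `n2`), and with 8000 sampled pairs split over 4 workers the estimate lands close to it.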

Accomplishment  11 

Finally, we have developed parallel algorithms for three-dimensional circuit extraction of interconnect
structures based on the boundary element method (BEM), both for constructing the BEM matrix and for
solving the resulting equations using direct and indirect methods. Speedups of about 12 on 16 processors of
an IBM SP2 multiprocessor have been reported.

TECHNOLOGY  TRANSFER 

The ProperCAD library, originally developed at the University of Illinois, has been licensed to a commercial
company, Sierra Vista Research (SVR). In cooperation with the University of Illinois, SVR has improved the
interfaces to the ProperCAD library and provided some documentation on how to use it. The library is now
commercially available from Sierra Vista Research.

The parallel design rule checking algorithms developed as part of the project have been transferred to
Cadence Design Systems, which has employed the concepts of parallel layout verification within its DRC tool,
VAMPIRE, now a commercial product.

The parallel placement algorithms developed as part of this project have been transferred to LSI Logic
Corporation, which has used the concepts of parallel simulated annealing in its placement tool, CMDE-PLACE.

The  parallel  logic  synthesis  algorithms  developed  as  part  of  this  contract  have  been  transferred  to  Ambit 
Design  Systems  and  have  been  incorporated  into  their  commercial  package  called  BuildGates. 


C.  LIST  OF  ALL  PUBLICATIONS  AND  TECHNICAL  REPORTS 

JOURNAL  PAPERS 

• P. Banerjee, J. Chandy, M. Gupta, J. G. Holm, A. Lain, D. J. Palermo, S. Ramaswamy and E. Su,
“The PARADIGM Compiler for Distributed Memory Multicomputers,” IEEE Computer, Vol. 28, No. 10,
Oct. 1995, pp. 37-47.


•  S.  Ramaswamy  and  P.  Banerjee,  “Simultaneous  Allocation  and  Scheduling  Using  Convex  Programming 
Techniques,”  Parallel  Processing  Letters  (Special  Issue  on  Partitioning  and  Scheduling),  Dec.  1995. 

• A. Roy-Chowdhury and P. Banerjee, “A New Error Analysis Based Method for Tolerance Computation
for Algorithm-Based Checks,” IEEE Trans. Computers, Vol. 45, No. 2, Feb. 1996, pp. 238-243.

• V. S. S. Nair, J. A. Abraham, P. Banerjee, “Efficient Techniques for the Analysis of Algorithm-Based
Fault Tolerance (ABFT) Schemes,” IEEE Trans. Computers, Vol. 45, No. 4, Apr. 1996, pp. 499-502.

• A. Roy Chowdhury, N. Bellas, and P. Banerjee, “Algorithm-Based Error Detection Schemes for Iterative
Solution of Partial Differential Equations,” IEEE Trans. Computers, Vol. 45, No. 4, Apr. 1996,
pp. 394-407.

• A. Roy-Chowdhury and P. Banerjee, “Algorithm-based Fault Location and Recovery for Matrix
Computations on Multiprocessor Systems,” IEEE Trans. Computers, Vol. 45, No. 11, Nov. 1996,
pp. 1239-1247.

•  E.  Rudnick,  V.  Chickermane,  P.  Banerjee,  J.  H.  Patel,  “Sequential  Circuit  Testability  Enhancement 
Using  a  Non-scan  Approach,”  IEEE  Transactions  on  VLSI  Systems,  to  appear,  1996. 

•  K.  McPherson  and  P.  Banerjee,  “Parallel  Algorithms  for  VLSI  Layout  Verification,”  Journal  of  Parallel 
and  Distributed  Computing,  Vol.  36,  No.  2,  August  1996,  pp.  156-172. 

• D. Palermo, E. W. Hodges and P. Banerjee, “Dynamic Data Partitioning for Distributed Memory
Multicomputers,” Journal of Parallel and Distributed Computing (Special Issue on Compilation Techniques
for Distributed Memory Systems), November 1, 1996, Vol. 38, No. 2, pp. 158-175.

• S. Ramaswamy, B. Simons and P. Banerjee, “Optimizations for Efficient Array Redistribution on
Distributed Memory Multicomputers,” Journal of Parallel and Distributed Computing (Special Issue
on Compilation Techniques for Distributed Memory Systems), November 1, 1996, Vol. 38, No. 2,
pp. 217-228.

•  S.  Ramaswamy,  S.  Sapatnekar,  and  P.  Banerjee,  “A  Framework  for  Exploiting  Data  and  Functional 
Parallelism  on  Distributed  Memory  Multicomputers,”  IEEE  Trans.  Parallel  and  Distributed  Systems, 
Vol.  8,  No.  11,  pp.  1098-1116,  November  1997. 

•  B.  Ramkumar  and  P.  Banerjee,  “ProperTEST:  A  Portable  Parallel  Test  Generator  for  Sequential 
Circuits,”  IEEE  Trans.  Computer-Aided  Design,  Vol.  16,  No.  5,  pp.  555-569,  May  1997. 

• G. Hasteer and P. Banerjee, “A Parallel Algorithm for State Assignment of Finite State Machines,”
IEEE Transactions on Computers, Vol. 47, No. 2, February 1998, pp. 242-246.

•  V.  Krishnaswamy,  R.  Gupta  and  P.  Banerjee,  “Implications  of  VHDL  Timing  Models  on  Simulation 
and  Software  Synthesis,”  Journal  of  Systems  Architecture,  North-Holland  Elsevier  Publishers,  Vol.  44, 
1997,  pp.  23-36. 

• G. Hasteer and P. Banerjee, “Simulated Annealing Based Parallel State Assignment for Finite State
Machines,” Journal of Parallel and Dist. Computing, Vol. 43, No. 1, May 25, 1997, pp. 21-35.

• J. A. Chandy, S. Kim, B. Ramkumar, S. Parkes, and P. Banerjee, “An Evaluation of Parallel Simulated
Annealing Strategies with Applications to Standard Cell Placement,” IEEE Trans. on Computer-Aided
Design, Vol. 16, No. 4, pp. 398-410, April 1997.

•  M.  Kandemir,  A.  Choudhary,  N.  Shenoy,  P.  Banerjee,  J.  Ramanujam,  “A  Linear  Algebra  Framework 
for  Automatic  Determination  of  Optimal  Data  Layouts,”  To  appear  in  IEEE  Transactions  on  Parallel 
and  Distributed  Systems,  1998. 

• G. Hasteer, A. Mathur, and P. Banerjee, “Efficient Equivalence Checking of Multi-Phase Designs
Using Phase Abstraction and Retiming,” To appear in ACM Transactions on Design Automation of
Electronic Systems (TODAES) Special Issue on High Level Design, Validation and Testing.


•  J.  Chandy  and  P.  Banerjee,  “A  Parallel  Circuit  Partitioned  Algorithm  for  Timing-Driven  Standard 
Cell  Placement,”  to  appear  in  Journal  of  Parallel  and  Distributed  Computing. 

CONFERENCE  PROCEEDINGS 

• S. Parkes, J. Chandy, and P. Banerjee, “A Library-Based Approach to Portable, Parallel,
Object-Oriented Programming: Interface, Implementation, and Application,” Proc. ACM Supercomputing 94
Conf., Washington, DC, Nov. 1994, pp. 69-78.

• P. Banerjee, J. Chandy, M. Gupta, J. G. Holm, A. Lain, D. J. Palermo, S. Ramaswamy and E. Su,
“The PARADIGM Compiler for Distributed Memory Message-Passing Multicomputers,” Proc. First
Int. Workshop on Parallel Processing, Bangalore, INDIA, Dec. 1994, pp. 322-330.

• P. Banerjee, “A Survey of Current and Future Research Directions in Parallel CAD,” Proc. Parallel
and Distributed LSI-CAD Workshop, Tokyo, JAPAN, Dec. 1994, pp. 57-66.

• S. Ramaswamy and P. Banerjee, “Automatic Generation of Efficient Array Redistribution Routines for
Distributed Memory Multicomputers,” Proc. 5th Symp. Frontiers of Massively Parallel Computation,
McLean, VA, Feb. 1995, pp. 342-349.

• A. Lain and P. Banerjee, “Exploiting Spatial Regularity with Irregular Iterative Applications,” Proc.
8th Int. Parallel Processing Symp. (IPPS-95), Santa Barbara, CA, Apr. 1995.

• K. De, J. Chandy, S. Roy, S. Parkes and P. Banerjee, “Parallel Algorithms for Logic Synthesis Based
on MIS,” Proc. 8th Int. Parallel Processing Symp. (IPPS-95), Santa Barbara, CA, Apr. 1995.

• M. Peercy and P. Banerjee, “Software Schemes of Reconfiguration and Recovery in Distributed Memory
Multicomputers Using the Actor Model,” Proc. Fault Tolerant Computing Symp. (FTCS-25), Jun.
1995, Pasadena, CA.

• D. Palermo and P. Banerjee, “Automatic Selection of Dynamic Data Partitioning Schemes for
Distributed-Memory Multicomputers,” Proc. 8th Int. Workshop on Languages and Compilers for Parallel
Computing (LCPC-95), Aug. 1995, Columbus, OH.

• S. Parkes, P. Banerjee and J. H. Patel, “A Parallel Algorithm for Fault Simulation Based on PROOFS,”
Int. Conf. Computer Design (ICCD-95), Austin, TX, Oct. 1995.

• J. Chandy and P. Banerjee, “Parallel Simulated Annealing Strategies for VLSI Cell Placement,” 9th
Int. Conf. VLSI Design, New Delhi, India, Jan. 1996.

•  S.  Ramaswamy,  E.  W.  Hodges,  and  P.  Banerjee,  “Compiling  MATLAB  Programs  to  SCALAPACK: 
Exploiting  Task  and  Data  Parallelism,”  Proc.  Int.  Parallel  Processing  Symp.  (IPPS-96),  Honolulu, 
Hawaii,  Apr.  1996,  pp.  613-620. 

• Z. Xing and P. Banerjee, “A Parallel Hierarchical Algorithm for Module Placement Based on Sparse
Linear Equations,” Proc. IEEE Int. Symp. Circuits and Systems (ISCAS-96), Atlanta, GA, May 1996,
Vol. IV, pp. 691-694.

• V. Krishnaswamy and P. Banerjee, “Actor-based Parallel VHDL Simulation Using Time Warp,” Proc.
1996 Int. Workshop on Parallel and Distributed Simulation (PADS-96), Philadelphia, PA, May 1996.

• A. Lain and P. Banerjee, “Compiler Support for Hybrid Irregular Accesses on Multicomputers,” Proc.
ACM Int. Conf. Supercomputing (ICS-96), Philadelphia, PA, May 1996, pp. 1-9.

• A. Roy-Chowdhury and P. Banerjee, “Compiler-Assisted Generation of Error-Detecting Parallel
Programs,” Proc. 26th Int. Symp. on Fault-Tolerant Computing (FTCS-26), Sendai, JAPAN, Jun. 1996.

•  D.  Palermo,  E.  Su,  E.  W.  Hodges,  and  P.  Banerjee,  “Compiler  Support  for  Privatization  for  Distributed 
Memory  Machines,”  Proc.  Int.  Conf.  Parallel  Processing  (ICPP-96),  Bloomingdale,  IL,  Aug.  1996. 


•  G.  Hasteer  and  P.  Banerjee,  “A  Parallel  Algorithm  for  State  Assignment  in  Finite  State  Machines,” 
Proc.  Int.  Conf.  Parallel  Processing  (ICPP-96),  Bloomingdale,  IL,  Aug.  1996. 

• V. Boppana, P. Saxena, P. Banerjee, W. K. Fuchs, and C. L. Liu, “A Parallel Algorithm for the
Technology Mapping of LUT-based FPGAs,” Proc. EUROPAR-96 Workshop on Parallel Nonnumerical
Algorithms, Lyon, FRANCE, Aug. 1996.

• J. A. Chandy, S. Parkes, and P. Banerjee, “Distributed Object Oriented Data Structures and
Algorithms for VLSI CAD,” Proc. Workshop on Parallel Algorithms for Irregularly Structured Problems,
Santa Barbara, CA, Aug. 1996.

• D. J. Palermo, E. W. Hodges, IV, and P. Banerjee, “Interprocedural Array Redistribution Data-Flow
Analysis,” Languages and Compilers for Parallel Computing, Santa Clara, CA, Aug. 1996.

• P. Prabhakaran and P. Banerjee, “Parallel Algorithms for Force-Directed Scheduling of Flattened and
Hierarchical Signal Flow Graphs,” Proc. Int. Conf. Computer Design (ICCD-96), Austin, TX, Oct.
1996.

• D. Palermo, E. W. Hodges, and P. Banerjee, “Techniques for Selecting and Analyzing Data
Distributions,” Workshop on Challenges in Compiling for Scalable Parallel Systems, New Orleans, LA,
Oct. 1996.

• K. McPherson and P. Banerjee, “Integrating Task and Data Parallelism in an Irregular Application:
A Case Study,” Proc. Symp. on Parallel and Distributed Processing, New Orleans, LA, Oct. 1996,
pp. 208-213.

•  V.  Krishnaswamy,  R.  Gupta,  P.  Banerjee,  “A  Procedure  for  Software  Synthesis  from  VHDL  Models,” 
Proc.  of  Asia-Pacific  Design  Automation  Conf.,  Tokyo,  JAPAN,  Jan.  1997. 

•  G.  Hasteer  and  P.  Banerjee,  “Simulated  Annealing  Based  Parallel  State  Assignment  for  Finite  State 
Machines,”  Proc.  Int.  Conf.  VLSI  Design  (VLSI-97),  Hyderabad,  INDIA,  Jan.  1997. 

• D. Krishnaswamy, M. S. Hsiao, V. Saxena, E. M. Rudnick, P. Banerjee, and J. Patel, “Parallel Genetic
Algorithms for Simulation-based Sequential Circuit Test Generation,” Proc. Int. Conf. VLSI Design
(VLSI-97), Hyderabad, INDIA, Jan. 1997.

• J. G. Holm, S. Parkes, and P. Banerjee, “Performance Evaluation of a C++ Library Based
Multithreaded System,” Hawaii Int. Conf. on System Sciences, Maui, HI, Jan. 1997.

• S. Roy and P. Banerjee, “A Comparison of Parallel Approaches for Algebraic Factorization in Logic
Synthesis,” Proc. Int. Parallel Processing Symposium (IPPS-97), Geneva, Switzerland, April 1997.

•  Z.  Xing,  J.  Chandy,  and  P.  Banerjee,  “Parallel  Global  Routing  for  Standard  Cells,”  Proc.  Int.  Parallel 
Processing  Symposium  (IPPS-97),  Geneva,  Switzerland,  April  1997. 

•  D.  Krishnaswamy,  E.  Rudnick,  and  P.  Banerjee,  “SPITFIRE:  Scalable  Parallel  Algorithms  for  Test  Set 
Partitioned  Fault  Simulation,”  Proc.  IEEE  VLSI  Test  Symp.,  Monterey,  CA,  Apr.  1997. 

• S. Roy and P. Banerjee, “An L-Shaped Partitioning-Based Algebraic Factorization Algorithm,” Proc.
Int. Symp. on Circuits and Systems (ISCAS-97), Hong Kong, Jun. 1997.

•  D.  Krishnaswamy,  P.  Banerjee,  E.  Rudnick  and  J.  Patel,  “Asynchronous  Parallel  Algorithms  for  Test 
Set  Partitioned  Parallel  Fault  Simulation,”  Proc.  Workshop  on  Parallel  and  Distributed  Simulation 
(PADS-97),  Jun.  1997. 

•  G.  Hasteer,  A.  Mathur,  P.  Banerjee,  “An  Efficient  Assertion  Checker  for  Combinational  Properties,” 
Proc.  Design  Automation  Conference  (DAC97),  Jun.  1997. 


• J. G. Holm, J. Chandy, G. Hasteer, V. Krishnaswamy, S. Parkes, S. Roy, and P. Banerjee, “Performance
Evaluation of Message-Driven Parallel VLSI CAD Applications on General-Purpose Multiprocessors,”
Proc. International Conference on Supercomputing (ICS-97), Vienna, AUSTRIA, July 1997.

•  D.  Krishnaswamy  and  P.  Banerjee,  “Exploiting  Task  and  Data  Parallelism  in  Parallel  Hough  and 
Radon  Transforms,”  Proc.  Int.  Conference  on  Parallel  Processing  (ICPP-97),  Bloomingdale,  IL,  Aug. 
1997. 

• V. Krishnaswamy, G. Hasteer, and P. Banerjee, “Load Balancing and Workload Minimization of
Overlapping Parallel Tasks,” Proc. Int. Conference on Parallel Processing (ICPP-97), Bloomingdale, IL,
Aug. 1997.

• J. A. Chandy and P. Banerjee, “A Parallel Circuit-Partitioned Algorithm for Timing-driven Standard
Cell Placement,” Proc. Int. Conference on Computer Design (ICCD-97), October 1997, Austin, TX.

• G. Hasteer, A. Mathur, and P. Banerjee, “A Framework for Equivalence Checking of Multi-Phase
FSMs,” Proc. International High-Level Design Validation and Test Workshop, Oakland, CA, Nov. 1997.

•  P.  Prabhakaran  and  P.  Banerjee,  “Simultaneous  Scheduling,  Binding  and  Floorplanning  in  High-Level 
Synthesis,”  Proc.  11th  International  Conference  on  VLSI  Design  (VLSI  Design’98),  Chennai,  India, 
Jan.  1998. 

• S. Roy, P. Banerjee and M. Sarrafzadeh, “Partitioning Sequential Circuits for Low Power,” Proc. 11th
International Conference on VLSI Design (VLSI Design’98), Chennai, India, Jan. 1998.

• S. Roy, A. Harms, and P. Banerjee, “PowerShake: A Low Power Driven Clustering and Factoring
Methodology for Boolean Expressions,” Proc. Design, Automation and Test in Europe Conference
(DATE 98), Paris, France, Feb. 1998.

• D. Chakrabarti, A. Lain, and P. Banerjee, “Evaluation of Compiler and Runtime Library Approaches
for Supporting Parallel Regular Applications,” Proc. Int. Parallel Processing Symp. (IPPS-98), Apr.
1998, Orlando, FL.

•  M.  Kandemir,  P.  Banerjee,  A.  Choudhary,  J.  Ramanujam,  N.  Shenoy,  “A  Generalized  Framework  for 
Global  Communication  Optimization,”  Proc.  Int.  Parallel  Processing  Symp.  (IPPS-98),  Apr.  1998, 
Orlando,  FL. 

• Z. Xing and P. Banerjee, “A Parallel Algorithm for Zero Skew Clock Tree Routing,” Proc. Int. Symp.
Physical Design (ISPD-98), Apr. 1998, Monterey, CA.

•  S.  Roy  and  P.  Banerjee,  “Resynthesis  of  Sequential  Circuits  for  Low  Power,”  Proc.  International 
Conference  on  Circuits  and  Systems  (ISCAS-98),  Monterey,  CA,  May  1998. 

• P. Prabhakaran and P. Banerjee, “Parallel Algorithms for Scheduling, Binding, and Floorplanning in
High-Level Synthesis,” Proc. International Conference on Circuits and Systems (ISCAS-98), Monterey,
CA, May 1998.

•  G.  Hasteer,  A.  Mathur,  and  P.  Banerjee,  “An  Implicit  Algorithm  for  Finding  Steady  States  and  its 
Application  to  FSM  Verification,”  Proc.  Design  Automation  Conference  (DAC-98),  Jun.  1998,  San 
Francisco,  CA. 

•  V.  Kim  and  P.  Banerjee,  “Parallel  Algorithms  for  Power  Estimation,”  Proc.  Design  Automation 
Conference  (DAC-98),  Jun.  1998,  San  Francisco,  CA. 

• M. Wang, M. Sarrafzadeh, and P. Banerjee, “Placement with Incomplete Data,” Proc. Design
Automation Conference (DAC-98), Jun. 1998, San Francisco, CA.

•  V.  Krishnaswamy  and  P.  Banerjee,  “Parallel  Compiled  Event  Driven  VHDL  Simulation,”  Proc.  Int. 
Conf.  Supercomputing  (ICS-98),  Melbourne,  AUSTRALIA,  July  1998. 


•  D.  R.  Chakrabarti,  N.  Shenoy,  A.  Choudhary,  and  P.  Banerjee,  “An  Efficient  Uniform  Run-time  Scheme 
for  Mixed  Regular-Irregular  Applications,”  Proc.  Int.  Conf.  Supercomputing  (ICS-98),  Melbourne, 
AUSTRALIA,  July  1998. 

•  M.  Kandemir,  A.  Choudhary,  N.  Shenoy,  J.  Ramanujam,  and  P.  Banerjee,  “A  Hyperplane  Based 
Approach  for  Optimizing  Spatial  Locality  in  Loop  Nests,”  Proc.  Int.  Conf.  Supercomputing  (ICS-98), 
Melbourne,  AUSTRALIA,  July  1998. 

• M. Kandemir, J. Ramanujam, A. Choudhary, and P. Banerjee, “An iteration space transformation algorithm
based on an explicit data layout representation for optimizing locality,” Proc. Workshop on
Languages  and  Compilers  for  Parallel  Computing  (LCPC-98),  Chapel  Hill,  NC,  Aug.  1998. 

•  M.  Kandemir,  N.  Shenoy,  P.  Banerjee,  J.  Ramanujam,  and  A.  Choudhary,  “Minimizing  Data  and 
Synchronization  Costs  in  One-Way  Communication,”  Proc.  Int.  Conf.  Parallel  Processing  (ICPP98), 
Minneapolis,  MN,  Aug.  1998. 

•  Z.  Xing  and  P.  Banerjee,  “A  Parallel  Algorithm  for  Timing-Driven  Global  Routing  for  Standard  Cells,” 
Proc. Int. Conf. Parallel Processing (ICPP98), Minneapolis, MN, Aug. 1998.

•  M.  Kandemir,  A.  Choudhary,  J.  Ramanujam,  N.  Shenoy,  and  P.  Banerjee,  “Enhancing  Spatial  Locality 
using  Data  Layout  Optimizations,”  Proc.  European  Conference  on  Parallel  Processing  (Euro-Par’98), 
Southampton,  ENGLAND,  Sep.  1998. 

•  A.  Mishra  and  P.  Banerjee,  “A  Fault  Tolerant  Multi-Grid  Algorithm,”  Proc.  Parallel  and  Distributed 
Computing  Systems  (PDCS98),  Chicago,  Sep.  1998. 

• S. Roy, A. Harms, and P. Banerjee, “A Low Power Logic Optimization Methodology Based on a Fast
Power  Driven  Mapping,”  Proc.  Int.  Conf.  Computer  Design  (ICCD-98),  Austin,  TX,  Oct.  1998. 

•  M.  Kandemir,  A.  Choudhary,  J.  Ramanujam,  N.  Shenoy,  and  P.  Banerjee,  “A  Matrix-Based  Approach 
to the Global Locality Optimization Problem,” Proc. Parallel Architectures and Compilation Techniques
(PACT-98),  Paris,  FRANCE,  Oct.  1998. 

•  S.  Roy  and  P.  Banerjee,  “Power  Drive:  A  fast,  canonical  POWER  estimator  for  DRIVing  synthEsis,” 
Proc.  1998  International  Conference  on  Computer-Aided  Design  (ICCAD-98),  San  Jose,  CA,  Nov. 
1998. 

•  G.  Hasteer,  A.  Mathur,  and  P.  Banerjee,  “Efficient  Equivalence  Checking  of  Multi-Phase  Designs  Using 
Retiming,” Proc. 1998 International Conference on Computer-Aided Design (ICCAD-98), San Jose,
CA,  Nov.  1998. 

•  D.  Chakrabarti,  P.  Joisha,  J.  Chandy,  D.  Krishnaswamy,  V.  Krishnaswamy,  and  P.  Banerjee,  “WADE: 
A Web-Based Automated Parallel CAD Environment,” Proc. International Conference on High Performance
Computing (HiPC'98), Chennai, India, Dec. 1998.

• M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee, “Improving locality using loop and data
transformations in an integrated framework,” Proc. 31st Int. Symp. on Microarchitecture (MICRO-31),
Dallas, TX, Dec. 1998.

•  P.  Prabhakaran,  J.  Crenshaw,  P.  Banerjee,  and  M.  Sarrafzadeh,  “Simultaneous  Scheduling,  Binding 
and  Floorplanning  for  Interconnect  Power  Optimization,”  Proc.  1999  VLSI  Design  Conference,  Goa, 
India,  Jan.  1999. 



D.  LIST  OF  ALL  PARTICIPATING  SCIENTIFIC  PERSONNEL 

D.1. University of Illinois

•  PROF.  PRITHVIRAJ  BANERJEE,  Professor,  Electrical  and  Computer  Engineering,  Coordinated 
Science  Lab,  University  of  Illinois,  Urbana. 

•  PROF.  JANAK  PATEL,  Professor,  Electrical  and  Computer  Engineering,  Coordinated  Science  Lab, 
University  of  Illinois,  Urbana. 

•  DR.  BALKRISHNA  RAMKUMAR,  Postdoctoral  Researcher,  Coordinated  Science  Lab,  University  of 
Illinois,  Urbana. 

• PRADEEP PRABHAKARAN, Ph.D. Student, Electrical and Computer Engineering, University of
Illinois (Advisor: P. Banerjee), graduated 9/98, presently working for Compaq-Digital.

•  SUMIT  ROY,  Ph.D.  Student,  Electrical  and  Computer  Engineering,  University  of  Illinois  (Advisor:  P. 
Banerjee),  graduated  8/98,  presently  working  for  Ambit  Design  Systems. 

• DANIEL PALERMO, Ph.D. Student, Electrical and Computer Engineering, University of Illinois
(Advisor: P. Banerjee), graduated 5/96, presently working for Hewlett-Packard Convex.

•  GAGAN  HASTEER,  Ph.D.  Student,  Computer  Science,  University  of  Illinois  (Advisor:  P.  Banerjee), 
graduated  12/97,  presently  working  for  Ambit  Design  Systems. 

• ERNESTO SU, Ph.D. Student, Electrical and Computer Engineering, University of Illinois (Advisor:
P.  Banerjee),  graduated  3/97,  presently  working  for  Intel  Corporation. 

•  JOHN  HOLM,  Ph.D.  Student,  Electrical  and  Computer  Engineering,  University  of  Illinois  (Advisor: 
P.  Banerjee),  graduated  4/97,  presently  working  for  Intel  Corporation. 

•  VENKAT  KRISHNASWAMY,  Ph.D.  Student,  Computer  Science,  University  of  Illinois  (Advisor:  P. 
Banerjee),  graduated  4/97,  presently  working  for  Intel  Corporation. 

•  DILIP  KRISHNASWAMY,  Ph.D.  Student,  Electrical  and  Computer  Engineering,  University  of  Illinois 
(Advisor:  P.  Banerjee),  graduated  7/97,  presently  working  for  Intel  Corporation. 

• ZHAOYUN JASON XING, Ph.D. Student, Computer Science, University of Illinois (Advisor: P. Banerjee),
graduated 7/97, presently working for SUN Microsystems.

D.2.  Northwestern  University 

The subcontract to Northwestern University supported the following Ph.D. and M.S. students in the ECE
department at Northwestern.

• DHRUVA RANJAN CHAKRABARTI, Ph.D. Student, Electrical and Computer Engineering, Northwestern
University (Advisor: P. Banerjee), expected graduation 8/99.

• YANHONG YUAN, Ph.D. Student, Electrical and Computer Engineering, Northwestern University
(Advisor: P. Banerjee), expected graduation 8/99.

• JER-SHENG CHEN, Ph.D. Student, Electrical and Computer Engineering, Northwestern University
(Advisor: P. Banerjee), expected graduation 8/00.

• JIWOONG VICTOR KIM, M.S./Ph.D. Student, Electrical and Computer Engineering, Northwestern
University (Advisor: P. Banerjee), received M.S. May 1998, expected Ph.D. graduation 8/00.

• PRAMOD JOISHA, M.S./Ph.D. Student, Electrical and Computer Engineering, Northwestern University
(Advisor: P. Banerjee), expected graduation 8/01.

D.3. Subcontract to Sierra Vista Research

• DR. STEVEN PARKES, Sierra Vista Research, San Jose, CA.

D.4. Subcontract to Cadence Design Systems

• DR. EDWIN PETRUS, Cadence Design Systems.

D.5. Subcontract to LSI Logic

• DR. SUNGHO KIM, LSI Logic.


E.  HONORS  RECEIVED 

•  Prof.  Banerjee  was  awarded  the  Best  Paper  Award  at  the  IEEE  VLSI  Test  Symposium,  Monterey,  CA, 
April  1998. 

• Prof. Banerjee received the 1996 Frederick Emmons Terman Award from the ASEE’s Electrical Engineering
Division, sponsored by Hewlett-Packard Company, presented to an Outstanding Young Electrical
Engineering Educator, for publishing the textbook “Parallel Algorithms for VLSI CAD”.

• Prof. Banerjee was elected a Fellow of the IEEE in 1995.

•  Prof.  Banerjee  was  invited  to  give  the  Keynote  Address  at  the  International  Conference  on  Parallel 
and  Distributed  Systems,  New  Orleans,  in  October  1997. 

• Prof. Banerjee was appointed Walter P. Murphy Chaired Professor of Electrical and Computer Engineering,
Northwestern University, on Sep. 1, 1996.


